12 research outputs found

    Shared-Memory Parallel Maximal Clique Enumeration

    We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and parallel methods for MCE on a shared-memory parallel machine. In this work, we present efficient shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative to a state-of-the-art sequential algorithm, (2) the algorithms have a provably small parallel depth, showing that they can scale to a large number of processors, and (3) our implementations on a multicore machine show good speedup and scaling behavior with an increasing number of cores, and are substantially faster than prior shared-memory parallel algorithms for MCE. Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 201
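    For orientation, the MCE task itself can be sketched with the classic sequential Bron-Kerbosch recursion with pivoting. This is only a minimal illustration of what "enumerating maximal cliques" means; it is not the parallel algorithm of the abstract, and the name `maximal_cliques` and the adjacency-dictionary input format are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[int, Set[int]]  # assumed format: vertex -> set of neighbours (undirected)


def maximal_cliques(adj: Graph) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique of an undirected graph (sequential sketch)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex covering the most candidates; only candidates
        # that are NOT neighbours of the pivot need to be branched on.
        pivot = max(p | x, key=lambda u: len(p & adj[u]))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is done: move it from candidates ...
            x = x | {v}  # ... to the excluded set

    yield from expand(set(), set(adj), set())
```

    On a triangle 0-1-2 with a pendant edge 2-3, for example, the generator yields the two maximal cliques {0, 1, 2} and {2, 3}. Each recursive call explores an independent subproblem, which is the kind of structure a parallel method can distribute across cores.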

    Butterfly Counting in Bipartite Networks

    We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete $2 \times 2$ biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges. Comment: 28 pages, 5 tables, 6 figures
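    As a rough illustration of the idea of randomized butterfly estimation, the sketch below counts butterflies exactly on a small graph and then estimates the count by edge sampling: each edge is kept independently with probability `p`, and the count in the sample is scaled by `p**-4`, since a butterfly has four edges and survives with probability `p**4`. This is one standard sampling scheme and is not claimed to be any specific algorithm from the paper; the function names and the edge-list format are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # assumed format: (left vertex, right vertex)


def count_butterflies(edges: Iterable[Edge]) -> int:
    """Exact number of 2x2 bicliques; quadratic in the number of left vertices."""
    right_of: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        right_of[u].add(v)
    total = 0
    # Two left vertices with c common right neighbours form C(c, 2) butterflies.
    for u, w in combinations(right_of, 2):
        c = len(right_of[u] & right_of[w])
        total += c * (c - 1) // 2
    return total


def estimate_butterflies(edges: Iterable[Edge], p: float,
                         seed: Optional[int] = None) -> float:
    """Unbiased estimate from an edge sample; assumes the edges are distinct."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]
    # Each butterfly keeps all four of its edges with probability p**4.
    return count_butterflies(sample) / p ** 4
```

    Smaller values of `p` reduce the work on the sample at the cost of higher variance in the estimate; with `p = 1.0` the estimator reduces to the exact count.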

    Graphlet counting in massive networks

    Graphs are a standard tool for deriving a flexible abstraction of interactions among entities and are extensively employed in a wide variety of domains, including bioinformatics, biochemistry, social sciences, and neurobiology. These, and many more domains, can be modeled as graphs (also called networks), which capture discrete entities (i.e., vertices) and the interactions among them (i.e., edges). Understanding the underlying structure of complex networks is a typical data-mining task. Many studies have shown that the frequency distribution of small subgraphs (also known as motifs or graphlets) is an effective tool to analyze complex networks. Graph motifs such as triangle (a cycle of size three), diamond (two triangles with one vertex in common), butterfly (a (2,2)-biclique), and $k$-clique (for $k \leq 6$) are regarded as building blocks to understand the true structure of an underlying network. Indeed, graph motifs capture cohesion and furnish analytic insights into the heart of real-world networks. Due to the surge of data, further fueled by the inevitable combinatorial explosion, counting motifs turns out to be a challenging task in large-scale networks. A possible approach to cope with the ever-increasing cost of counting in large graphs is to approximate the number of motifs through a sampling mechanism. Existing approximation approaches estimate the motif count with a considerable reduction in runtime, which makes counting feasible in planetary-scale networks. The demand for approximate approaches is further driven by numerous real-world applications that do not require an exact count of motifs and for which an accurate but faster approximation suffices.
Consequently, several approximation approaches have been designed to estimate triangle and butterfly counts, many for static graphs and far fewer for streaming networks, in which the network changes dynamically through edge insertions and deletions. The main focus of this dissertation is approximate approaches in static and streaming networks for graph motif counting, more specifically triangle and butterfly counting. We discussed several randomized algorithms and presented empirical evaluations of them. We compared the runtime and accuracy of existing work for triangle counting in static and streaming networks in great detail. Further, we described our approximation algorithms for butterfly counting. The experiments for butterfly counting show that our algorithms outperform the existing methods in terms of runtime and accuracy.

    Shared-memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs

    Maximal Clique Enumeration (MCE) is a fundamental graph mining problem and is useful as a primitive in identifying dense structures in a graph. Due to the high computational cost of MCE, parallel methods are imperative for dealing with large graphs. We present shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative to a state-of-the-art sequential algorithm, (2) the algorithms have a provably small parallel depth, showing they can scale to a large number of processors, and (3) our implementations on a multicore machine show good speedup and scaling behavior with increasing number of cores and are substantially faster than prior shared-memory parallel algorithms for MCE; for instance, on certain input graphs, while prior works either ran out of memory or did not complete in five hours, our implementation finished within a minute using 32 cores. We also present work-efficient parallel algorithms for maintaining the set of all maximal cliques in a dynamic graph that is changing through the addition of edges. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published as Das, Apurba, Seyed-Vahid Sanei-Mehri, and Srikanta Tirthapura. "Shared-memory parallel maximal clique enumeration from static and dynamic graphs." ACM Transactions on Parallel Computing (TOPC) 7, no. 1 (2020): 1-28. DOI: 10.1145/3380936. Copyright 2020 Association for Computing Machinery. Posted with permission.

    Butterfly Counting in Bipartite Networks

    We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published as: Sanei-Mehri, Seyed-Vahid, Ahmet Erdem Sariyuce, and Srikanta Tirthapura. "Butterfly counting in bipartite networks." In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2150-2159. 2018. DOI: 10.1145/3219819.3220097. Copyright 2018 Association for Computing Machinery. Posted with permission.

    Shared-Memory Parallel Maximal Clique Enumeration

    We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and parallel methods for MCE on a shared-memory parallel machine. In this work, we present efficient shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative to a state-of-the-art sequential algorithm, (2) the algorithms have a provably small parallel depth, showing that they can scale to a large number of processors, and (3) our implementations on a multicore machine show good speedup and scaling behavior with an increasing number of cores, and are substantially faster than prior shared-memory parallel algorithms for MCE. This is a manuscript published as Das, Apurba, Seyed-Vahid Sanei-Mehri, and Srikanta Tirthapura. "Shared-Memory Parallel Maximal Clique Enumeration." arXiv preprint arXiv:1807.09417 (2018).